I.  INTRODUCTION


A.  PURPOSE  AND  SUMMARY  OF  RESULTS 

The  purpose  of  this  study  is  to  develop  the  logistic  regression  alternative  for 
estimating attrition rates using length of service and grade as carrier variables. It would
be  most  useful  if  the  regression  coefficients  showed  temporal  stability  and  were  not 
highly  dependent  upon  the  occupational  specialty.  It  is  hoped  that  this  development 
can  enhance  previously  developed  understanding  of  the  attrition  process  as  it  affects 
the  United  States  Marine  Corps  officer  manpower  data. 

Unfortunately, the logistic regression approach to this problem does not improve
upon estimators developed by earlier workers (see Table 8 on page 30). It does,
however, contribute to the understanding of the attrition process as it relates to length
of service and grade. The partial regression coefficients can serve in ad hoc calculations
to indicate the direction of change and to make rough estimates of the amount of
change. These coefficients do, however, change substantially as one
changes the military occupational specialty (see Table 7 on page 24). The aviation
community especially appears to possess coefficients quite different from those of other
communities.

B.  BACKGROUND 

The first step in any manpower planning should be a good description of the
system or organization. Such a description allows us to obtain reasonable forecast values. Forecasts
should never be interpreted as what will happen but as central estimates of what could
happen if the assumed trends continue. They therefore provide a guide for management
action required to achieve a desired objective. Also, good forecast values depend upon
finding efficient ways to estimate attrition rates. In other words, the description of the
system, the attrition rates and the forecasts all depend on one another.

The forecasts made by manpower planning models are affected by three general
factors: existing inventory, projected losses and projected gains. In order to project the
inventory into various future time periods it is necessary to forecast the future values
using a realistic system of flow rates.

Estimation techniques for the USMC officer attrition rates have been developed
by Major D.D. Tucker in a thesis [Ref. 1] submitted at the Naval Postgraduate School
in September 1985, and further by Major John R. Robinson in a thesis [Ref. 2]
submitted at the Naval Postgraduate School in March 1986. They used James-Stein
and other shrinkage type parameter estimator schemes for the purpose of generating
stable manpower loss rates. The reader is referred to Tucker [Ref. 1] and Robinson
[Ref. 2] for most of the background information and the data structure used. By
necessity, some of that information will be repeated in this paper.

The United States Marine Corps has about 20,000 officers. These can be cross
classified into 40 military occupational specialties (MOS), 31 length of service (LOS)
cells and 10 grades; hence 12,400 categories for manpower planning purposes. About
half of these categories are unoccupied for structural reasons. These structural
zero categories are described in Chapter III. The officer attrition and promotion
structure was described by Tucker [Ref. 1].

One goal of this paper is to examine whether the logistic regression model is an
efficient way to estimate the attrition rates (i.e. the rate of leaving the service, not of
changes in MOS, LOS or grade) for the officer MOS/LOS/grade categories. This
problem is difficult because of the large number of cells with low inventory. Tucker
[Ref. 1] and Robinson [Ref. 2] collected the cells into major groups or aggregates to
treat this small cell problem; attempts were made to aggregate cells that were believed
to have common statistical behavior. In the present work we will not collect the cells
into major groups. Every MOS will be taken individually. The structural zero cells will
be dropped before applying the fitting procedure; that is, structural zero cells will not
be included in the regression equations.

There are seven years of data available for the present study. The first four years
(from 1977 to 1980) will be used for model development and logistic regression fitting;
the last three years (from 1981 to 1983) for validation.

C.  ORGANIZATION 

Chapter  II  contains  the  details  of  the  methodology  and  notation  used  in  the 
present  work.  A  brief  summary  of  the  generalized  linear  regression  model  is  presented 
in  this  chapter. 

Chapter III explains the logistic regression model structure for the Marine Corps
data and the validation procedure. A numerical example is given to illustrate the
fitting and validation procedures. This chapter also compares figures of merit
with Robinson's [Ref. 2] results.

Chapter  IV  thoroughly  discusses  the  results  and  recommendations. 


Appendix  A  includes  the  APL  functions  for  the  data  manipulation,  the  logistic 
regression  and  the  validation  of  the  model. 

Appendix  B  illustrates  the  logistic  probability  plots  of  residuals  and  the  plots  of 
the  residuals  vs.  fitted  values  for  selected  cases. 
II.  METHOD  OF  ESTIMATION


A.  INTRODUCTION 

A major use of regression models is prediction. Thus, given data on a response
variable y and associated predictor variables $x_j$ (j = 1 to p), the aim of the regression is
to find a function of the $x_j$'s which is, in some sense, a good predictor of y. It is
assumed throughout that the $x_j$'s at which future predictions are required are not
specified in advance but will occur randomly over some population of values and that
the success of prediction can be judged by its performance over such a population.

Logistic regression is a member of the class of generalized linear models. An
overview of the linear model is given in the following section. The approach
and background for the logistic regression model are taken from Pregibon's
paper [Ref. 3].

B.  AN  OVERVIEW  OF  THE  LINEAR  REGRESSION  MODEL 

Linear regression is used to relate a response variable $y_i$ to one or several
explanatory or descriptive variables $x_{ij}$ through a set of linear equations of the form

$$y_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i, \qquad i = 1, \ldots, n$$

The $y_i$ (for i = 1 to n) are the n observed values of the response variable, the $x_{ij}$ (for i
= 1 to n) are the n values of the j-th explanatory variable (for j = 1 to p), and the
parameters $\beta_j$ are the unknown regression coefficients. The $\varepsilon_i$ are the random "errors"
or fluctuations. The variables $x_{ij}$ and $y_i$ are sometimes called "independent" and
"dependent" variables.

The linear equation above can be simplified by defining an extra variable $x_{i0}$
whose value is always 1 ($x_{i0} = 1$), so the model with constant term can be written as

$$y_i = \sum_{j=0}^{p} \beta_j x_{ij} + \varepsilon_i, \qquad i = 1, \ldots, n$$

Usually the $\varepsilon_i$ are assumed to be statistically independent of each other with zero
means and with a constant variance that does not depend on i or $x_{ij}$.

In regression we usually want to estimate the regression coefficients from the
data, either because we want to know and interpret the coefficients themselves, or
because we will use them to predict future values of $y_i$. Upon replacing the $\beta_j$ by their
estimated values $\hat\beta_j$, we obtain the fitted (or "predicted") values $\hat y_i$,

$$\hat y_i = \sum_{j=0}^{p} \hat\beta_j x_{ij}, \qquad i = 1, \ldots, n$$

The residuals $\hat\varepsilon_i$ are defined as the differences between the observed and the fitted
values,

$$\hat\varepsilon_i = y_i - \hat y_i, \qquad i = 1, \ldots, n$$

The residuals are used in many diagnostic displays because they contain most of
the information regarding lack of fit of the model to the data. In terms of fitted values and
residuals, we have

data = fit + residual

which in mathematical notation is expressed as

$$y_i = \sum_{j=0}^{p} \hat\beta_j x_{ij} + \hat\varepsilon_i, \qquad i = 1, \ldots, n$$

In matrix notation the least-squares estimate $\hat\beta$ is found by minimizing

$$\varphi = \varepsilon^T\varepsilon = \|y - X\beta\|^2 = (y - X\beta)^T (y - X\beta)$$

where $\varepsilon$ is the vector of residuals, $\varphi$ is the squared length of the residual vector and
$\hat y = X\hat\beta$ is the vector of fitted values. Expanding the product, the equation becomes

$$\varphi = y^Ty - 2y^TX\beta + \beta^TX^TX\beta$$

If we take the derivative of $\varphi$ with respect to $\beta$ and set $\partial\varphi/\partial\beta$ equal to 0, then the
least-squares estimate $\hat\beta$ is obtained by solving the normal equations

$$X^Ty - X^TX\hat\beta = 0$$

The solution of this linear system is

$$\hat\beta = (X^TX)^{-1}X^Ty$$

which is sensitive to poorly fit observations and extreme design points.
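The normal equations can be illustrated with a small sketch. The code below is not part of the original APL workspace; the helper names `solve` and `least_squares` and the four data points are hypothetical, chosen so that the data lie exactly on the line y = 1 + 2x.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix [A | b]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def least_squares(X, y):
    """Return beta_hat solving the normal equations X'X beta = X'y."""
    p = len(X[0])
    XtX = [[sum(row[a] * row[c] for row in X) for c in range(p)] for a in range(p)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    return solve(XtX, Xty)


# Hypothetical data: first column is the constant x_i0 = 1.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]

beta_hat = least_squares(X, y)
fitted = [sum(b * x for b, x in zip(beta_hat, row)) for row in X]
residuals = [yi - fi for yi, fi in zip(y, fitted)]
```

Solving the linear system directly, rather than forming the inverse of X'X, is a common numerical choice; for well-conditioned problems the two give the same estimate.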

Presently, there is a fairly large battery of diagnostics available for detecting which
observations exert undue influence on $\hat\beta$. The two basic quantities that are most useful
for this purpose are the residuals, $\hat\varepsilon_i = y_i - x_i\hat\beta$, and the projection matrix

$$M = I - H = I - X(X^TX)^{-1}X^T$$

where H is called the hat matrix. Essentially, the vector $\hat\varepsilon$ describes the deviation of the
observed data from the fit, and M the subspace in which $\hat\varepsilon$ lies.

In summary, the residual vector $\hat\varepsilon$ is important for the detection of ill-fitting
points, but will not adequately point to observations which unduly influence the fit. In
particular, large residuals are seldom associated with high-leverage points, whereas
small residuals (which usually pass our inspection unnoticed) are typically of the
opposite character.

C.  BACKGROUND  AND  NOTATION  FOR  THE  LOGISTIC  REGRESSION 

1.  General 

A  maximum  likelihood  fit  of  a  regression  model  is  extremely  sensitive  to 
outlying  responses  and  extreme  points  in  the  design  space. 

Classically,  logistic  regression  models  were  fitted  to  data  obtained  under 
experimental  conditions,  for  example,  bioassay  and  related  dose-response  applications. 
The  current  use  of  logistic  regression  methods  includes  the  analysis  of  data  obtained  in 
observational  studies.  In  contrast  to  controlled  experimentation,  data  from  such 
studies  can  be  notoriously  "bad"  both  from  the  point  of  view  of  outlying  responses  (y), 
and  from  the  point  of  view  of  extreme  points  in  the  design  space  (X).  The  usual 
method  of  fitting  logistic  regression  models,  maximum  likelihood,  has  good  optimality 
properties  in  ideal  settings,  but  is  extremely  sensitive  to  "bad"  data  of  the  above  types. 

In  particular,  good  data  analysis  for  the  logistic  regression  models  need  not  be 
expensive  or  time  consuming. 

2.  Unstructured  case 

Consider a single binomial response y ~ B(n,p). If we let $\theta = \mathrm{logit}(p) =
\log\{p/(1-p)\}$, the probability function of y can be written as

$$f(y;\theta) = \exp\{y\theta - a(\theta) + b(y)\}, \qquad y = 0, 1, \ldots, n$$

with $a(\theta) = n\log(1 + e^{\theta})$, $b(y) = \log\binom{n}{y}$, and where throughout this paper log(.) =
log_e(.). Up to an arbitrary constant, the logarithm of $f(y;\theta)$,

$$l(\theta;y) = y\theta - a(\theta) + b(y)$$

is the loglikelihood function of $\theta$. The score and information functions are given by

$$s(\theta;y) = \frac{\partial l(\theta;y)}{\partial\theta} = y - \dot a(\theta) = y - np$$

$$v(\theta;y) = -\frac{\partial s(\theta;y)}{\partial\theta} = \ddot a(\theta) = np(1-p)$$

where "a" with k dots above it denotes $(\partial^k/\partial\theta^k)\,a(\theta)$. Standard results yield
$E(y) = np = \dot a(\theta)$ and $\mathrm{Var}(y) = np(1-p) = \ddot a(\theta)$. Also, since $s(\hat\theta;y) = 0$ at the maximum
likelihood estimate (m.l.e.) of $\theta$, we have $\hat\theta = \dot a^{-1}(y) = \mathrm{logit}(y/n)$ as the m.l.e. of $\theta$
based on a single binomial observation y.

Given a sample of N independent binomial responses $y_i \sim B(n_i, p_i)$, the
loglikelihood function for the sample is the sum of the individual loglikelihood
contributions:

$$l(\theta;y) = \sum_{i=1}^{N} l(\theta_i;y_i) = \sum_{i=1}^{N} \{y_i\theta_i - a(\theta_i) + b(y_i)\}$$
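The single-observation quantities can be checked numerically. The sketch below is illustrative only: the function names and the values y = 3, n = 10 are assumptions, not taken from the Marine Corps data. It evaluates the loglikelihood, score and information on the logit scale and confirms that the score vanishes at logit(y/n).

```python
import math


def logit(p):
    """Log-odds of a probability 0 < p < 1."""
    return math.log(p / (1.0 - p))


def expit(theta):
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-theta))


def loglik(theta, y, n):
    """l(theta; y) = y*theta - a(theta) + b(y) for an integer binomial count y out of n."""
    a = n * math.log(1.0 + math.exp(theta))
    b = math.log(math.comb(n, y))
    return y * theta - a + b


def score(theta, y, n):
    """s(theta; y) = y - n*p."""
    return y - n * expit(theta)


def information(theta, n):
    """v(theta) = n*p*(1 - p)."""
    p = expit(theta)
    return n * p * (1.0 - p)


y, n = 3, 10                      # hypothetical observation
theta_hat = logit(y / n)          # m.l.e. on the logit scale
score_at_mle = score(theta_hat, y, n)
info_at_mle = information(theta_hat, n)
pmf_at_mle = math.exp(loglik(theta_hat, y, n))   # equals the binomial probability of y
```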

3.  The  logistic  regression  model 

The likelihood function $l(\theta;y)$ is over-specified: there are as many parameters
as observations. Given a set of m explanatory variables $(X_1, X_2, \ldots, X_m)$, the logistic
regression model utilizes the relationship

$$\theta = \mathrm{logit}(p) = X\beta$$

as the description of the systematic component of the response y. In terms of the
m-dimensional parameter $\beta$, we have the loglikelihood function

$$l(X\beta;y) = \sum_{i=1}^{N} l(x_i\beta;y_i) = \sum_{i=1}^{N} \{y_i x_i\beta - a(x_i\beta) + b(y_i)\}$$

The m.l.e. maximizes the above equation and is a solution (assumed unique) to
$(\partial/\partial\beta)\,l(X\beta;y) = 0$. In particular, $\hat\beta$ satisfies the system of equations

$$\sum_{i=1}^{N} x_{ij}\,\{y_i - \dot a(x_i\hat\beta)\} = 0, \qquad j = 1, \ldots, m$$

Writing $s = y - \dot a(X\hat\beta) = y - n\hat p$, the formulation of the likelihood equations is

$$X^Ts = X^T(y - \hat y) = 0$$
where $\hat y = n\hat p$ and T denotes the transpose. These equations, although very similar to
their normal theory counterparts, are nonlinear in $\beta$, and iterative methods are required
to solve them. Typically, when second derivatives are easy to compute (here,
$-(\partial/\partial\beta)X^Ts = X^TVX$ with $V = \mathrm{diag}\{\ddot a(x_i\beta)\}$), the Newton-Raphson method is
employed. This leads to the iterative scheme

$$\beta^{t+1} = \beta^{t} + (X^TVX)^{-1}X^Ts$$

where both V and s are evaluated at $\beta^{t}$. At convergence (t = u), we take $\hat\beta = \beta^{u}$ and
denote the fitted values $n_i\hat p_i$ by $\hat y_i$. The estimated variance of $y_i$ is $v_{ii} = n_i\hat p_i(1 - \hat p_i)$.

A most useful way to view the iterative process outlined above is by the
method of iteratively reweighted least-squares (IRLS). This is obtained by employing the
pseudo-observation vector $z^{t} = X\beta^{t} + V^{-1}s$, for which the above equation becomes

$$\beta^{t+1} = (X^TVX)^{-1}X^TVz^{t}$$

At convergence, we have $z = X\hat\beta + V^{-1}s$. Thus we may write the maximum
likelihood estimator of $\beta$ as

$$\hat\beta = (X^TVX)^{-1}X^TVz$$
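The step from the Newton-Raphson update to the IRLS form is a one-line substitution. The following worked derivation uses only the symbols already defined in this section, with V and s evaluated at the current iterate:

```latex
\begin{aligned}
\beta^{t+1} &= \beta^{t} + (X^{T}VX)^{-1}X^{T}s \\
            &= (X^{T}VX)^{-1}X^{T}V\,X\beta^{t} + (X^{T}VX)^{-1}X^{T}V\,V^{-1}s \\
            &= (X^{T}VX)^{-1}X^{T}V\,\bigl(X\beta^{t} + V^{-1}s\bigr) \\
            &= (X^{T}VX)^{-1}X^{T}V\,z^{t}.
\end{aligned}
```

Each iteration is therefore a weighted least-squares regression of the pseudo-observations on X with weight matrix V.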

4.  Output  from  a  maximum  likelihood  fit 

Once the model has been fitted (that is, we have the m.l.e. $\hat\beta$), various
quantities from the fitting process are available for the data analysis. Typically, these
quantities consist of subsets of the following:

1. the estimated parameter vector, $\hat\beta$;

2. the individual coefficient standard errors, s.e.($\hat\beta_j$);

3. the estimated covariance matrix of $\hat\beta$, $\mathrm{var}(\hat\beta) = (X^TVX)^{-1}$;

4. the chi-squared goodness of fit statistic $\chi^2 = \sum_i s_i^2 / v_{ii}$;

5. the individual components of $\chi^2$, namely $\chi_i = s_i/\sqrt{v_{ii}} = (y_i - n_i\hat p_i)/\sqrt{n_i\hat p_i(1 - \hat p_i)}$;

6. the deviance $D = -2\{l(X\hat\beta;y) - l(\hat\theta;y)\}$, where $l(\hat\theta;y)$ refers to the maximum
of the loglikelihood function based on fitting each point exactly, i.e. $\hat\theta_i = \mathrm{logit}(y_i/n_i)$.

Asymptotic arguments suggest that the deviance and chi-squared statistics
have the same limiting null $\chi^2(N - m)$ distribution, and hence provide some measure of the
appropriateness of the fitted model.


D.  THE  BASIC  BUILDING  BLOCKS  OF  REGRESSION  DIAGNOSTICS 


1.  Preliminaries

After  fitting  a  logistic  regression  model,  and  prior  to  drawing  inferences  from 
it,  the  natural  succeeding  step  is  that  of  critically  assessing  the  fit.  In  practice  however, 
this  assessment  is  rarely  considered  and  seldom  carried  out.  The  basic  reasons  are 

1.  the  lack  of  routine  methods  for  performing  such  an  analysis,  and 

2.  the  presumably  high  cost  of  doing  so. 

The  role  of  a  regression  diagnostician  is  to  provide  routine  methods  of  model 
sensitivity  analysis  which  are  both  intuitively  appealing  and  inexpensive.  Clearly  this 
requires  a  thorough  understanding  of  the  model  and  the  nature  of  the  fitting  process. 

2.  The basic building blocks

For the logistic regression model, the basic building blocks for the
identification of outlying and influential points will again be the residual vector and a
projection matrix. For the linear model, residuals are rather uniquely defined (apart
from standardization), whereas for the logistic regression model, residuals can be
defined on several (at least three) scales. The two most useful are the components of
chi-square, given above in item 5, and the components of deviance, $D = \sum_i d_i^2$, where

$$d_i = \pm\sqrt{2}\,\{l(\hat\theta_i;y_i) - l(x_i\hat\beta;y_i)\}^{1/2}$$

and the plus or minus sign is used according as $\hat\theta_i > x_i\hat\beta$ or $\hat\theta_i < x_i\hat\beta$. Note that $d_i$ is
defined for all values of $y_i$ even though $\hat\theta_i$ may not be. In particular, at y = 0, $d^2 = -2n\log(1 - \hat p)$,
and at y = n, $d^2 = -2n\log(\hat p)$. Both $\chi^2$ and D are measures of the
goodness-of-fit of the model.
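The two residual scales can be sketched in a few lines. The function names below are hypothetical, and the sign of the deviance component follows the rule stated in the text (positive when y/n exceeds the fitted probability). The two test cells use values from the numerical example of Chapter III: a cell with no losses (y = 0, n = 4.5, fitted logit -1.1108) and a cell in which every officer left (y = 3, n = 3, fitted logit -1.8247).

```python
import math


def chi_component(y, n, p):
    """Pearson component: (y - n p) / sqrt(n p (1 - p))."""
    return (y - n * p) / math.sqrt(n * p * (1.0 - p))


def deviance_component(y, n, p):
    """Signed square root of d^2 = 2{l(theta_hat; y) - l(x beta_hat; y)}.

    Terms with y = 0 or y = n are dropped, which reproduces the limits
    d^2 = -2 n log(1 - p) and d^2 = -2 n log(p) quoted in the text.
    """
    d2 = 0.0
    if y > 0:
        d2 += 2.0 * y * math.log(y / (n * p))
    if y < n:
        d2 += 2.0 * (n - y) * math.log((n - y) / (n * (1.0 - p)))
    sign = 1.0 if y > n * p else -1.0
    return sign * math.sqrt(max(d2, 0.0))


def expit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


p_no_loss = expit(-1.1108)
chi_no_loss = chi_component(0, 4.5, p_no_loss)
dev_no_loss = deviance_component(0, 4.5, p_no_loss)

p_all_left = expit(-1.8247)
chi_all_left = chi_component(3, 3, p_all_left)
dev_all_left = deviance_component(3, 3, p_all_left)
```

The magnitudes agree with the corresponding entries of Table 6 to about three decimal places.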

The analog of the projection matrix for the logistic model will also be denoted
by M, which in its general form is given as

$$M = I - H = I - V^{1/2}X(X^TVX)^{-1}X^TV^{1/2}$$

The usefulness of M arises as a consequence of the IRLS formulation described earlier.
In particular, as $\hat\beta = (X^TVX)^{-1}X^TVz$, the vector of pseudo-residuals is given by

$$z - X\hat\beta = \{I - X(X^TVX)^{-1}X^TV\}z = V^{-1/2}MV^{1/2}z$$

Using the fact that $z = X\hat\beta + V^{-1}s$, this can be written as $V^{-1}s = V^{-1/2}MV^{-1/2}s$.
Premultiplication by the diagonal matrix $V^{1/2}$ yields $\chi = M\chi$, where $\chi = V^{-1/2}s$.
Thus, as in the linear model case, M is symmetric, idempotent and spans the residual
($\chi$) space. This suggests that small $m_{ii}$, the diagonal elements of the
projection matrix M, should be useful in detecting extreme points in the design space.

In most cases, the examination of $\chi_i$, $d_i$ and $m_{ii}$ will call attention to outlying
and influential points. In some cases, combinations of these (for example, studentized
residuals) will also be useful. For displaying these quantities, index plots are generally
suggested (strongly so if the order of the observations is important): that is, plots of
$\chi_i$ vs i, $d_i$ vs i and $m_{ii}$ vs i. In particular cases, plots of these building blocks against the
fitted values could prove useful.


III.  MODEL  BUILDING  WITH  USMC  MANPOWER  DATA 
A.  GENERAL 

Robinson [Ref. 2] explains the conversion of the raw data to an APL workspace. A
brief explanation of the conversion is given in Appendix A. The summary data file
classifies the Marine Corps officer inventory into 40 military occupational specialties,
10 grade levels, 31 lengths of service and 8 loss categories. In the present study we are
not dealing with the type of loss; the loss categories were described by Tucker [Ref. 1]. For use in
our model, grades and military occupational specialties (MOS) are defined by
Tables 1 and 2. When reference is made to a particular grade or group of grades, the
code number from Table 1 is used instead of the name of the grade. For example, this
project will refer to the grades first lieutenant, captain and major as numbers 5, 6 and 7
respectively. Tucker and Robinson used data code numbers for the MOS instead of
the actual MOS. For example, this project will refer to the Air Traffic Control MOS as
number 37, not 73. It should also be understood that the two digit MOS identifier listed
in Table 2 is strictly the military occupational specialty identifier in the USMC MOS
manual. We will also use the code number from Table 2 for the MOS. The column
containing the letters A through E refers to the structural zero categories.

TABLE 1
GRADES

CODE   GRADE
0      WARRANT OFFICER (W-1)
1      CHIEF WARRANT OFFICER (CWO-2)
2      CHIEF WARRANT OFFICER (CWO-3)
3      CHIEF WARRANT OFFICER (CWO-4)
4      SECOND LIEUTENANT
5      FIRST LIEUTENANT
6      CAPTAIN
7      MAJOR
8      LIEUTENANT COLONEL
9      COLONEL

TABLE 2
MILITARY OCCUPATIONAL SPECIALTIES (MOS)

(Entries shown as -- are not legible in the source.)

DATA
CODE   MOS   CAT   MOS TITLE
 0     UN    A     UNKNOWN
 1     01    A     PERSONNEL AND ADMINISTRATION
 2     02    A     INTELLIGENCE
 3     03    C     INFANTRY
 4     04    A     LOGISTICS
 5     08    A     FIELD ARTILLERY
 6     11    D     UTILITIES
 7     13    A     ENGINEER, CONSTRUCTION AND EQUIPMENT
 8     14    D     DRAFTING, SURVEYING AND MAPPING
 9     --    --    PRINTING AND REPRODUCTION
10     18    C     TANK AND AMPHIBIAN TRACTOR
11     21    A     ORDNANCE
12     23    B     AMMUNITION AND EXPLOSIVE ORDNANCE DISPOSAL
13     25    A     OPERATIONAL COMMUNICATIONS
14     26    A     SIGNALS INTELLIGENCE/GROUND ELECTRONIC WARFARE
15     28    B     DATA/COMMUNICATIONS MAINTENANCE
16     30    A     SUPPLY ADMINISTRATION AND OPERATIONS
17     31    A     TRANSPORTATION
18     33    A     FOOD SERVICE
19     34    A     AUDITING, FINANCE AND ACCOUNTING
20     35    A     MOTOR TRANSPORT
21     40    A     DATA SYSTEMS
22     --    B     MARINE CORPS EXCHANGE
23     --    A     PUBLIC AFFAIRS
24     --    A     LEGAL SERVICES
25     --    A     TRAINING AND AUDIOVISUAL SUPPORT
26     --    B     BAND
27     --    D     --
28     58    A     MILITARY POLICE AND CORRECTIONS
29     59    B     ELECTRONICS MAINTENANCE
30     60    A     60XX
31     61    A     AIRCRAFT MAINTENANCE
32     63    B     AVIONICS
33     65    B     AVIATION ORDNANCE
34     68    B     WEATHER SERVICE
35     70    D     AIRFIELD SERVICES
36     72    A     AIR CONTROL, AIR SUPPORT AND ANTI-AIR WARFARE
37     73    A     AIR TRAFFIC CONTROL
38     75    C     PILOTS AND NAVAL FLIGHT OFFICERS
39     99    E     IDENTIFYING MOS AND REPORTING MOS

A structural zero is a cell whose inventory is always zero because certain grade
and length of service combinations should never appear in that military occupational
specialty (MOS). For example, a Colonel with only a few years of service in any MOS, or an
infantry warrant officer in MOS 03, does not exist. The effect of these structural zero
categories is summarized in Table 3.


TABLE 3
STRUCTURAL ZEROES CATEGORIES

            Grades              Number    Stru. Zeroes   Total Zeroes
Category    within MOS          of MOS    per MOS        per Cat.
A           WO1 ... LTCOL         23        129            2967
B           WO1 ... CWO4, LDO      8        159            1272
C           2LT ... LTCOL          3        202             606
D           WO1 ... CWO4           5        237            1185
E           WO1 ... COL            1        119             119
TOTAL                             40                       6149

B.  HOW TO BUILD THE LOGISTIC REGRESSION MODEL WITH USMC DATA

1.  Introduction 

The purpose of this study is to develop the logistic regression model for
estimating USMC officer attrition rates using length of service (LOS) and grade (GR)
as carrier variables. The logistic regression model for the estimation of USMC officer
attrition rates can be formulated as

$$\theta = \mathrm{logit}(p) = \beta_1 + \beta_2(\mathrm{LOS}) + \beta_3(\mathrm{GR})$$

In matrix notation, this can be written as

$$\theta = X\beta$$

where X is an N×m matrix, also called the design space, and $\beta$ is the m×1 vector of
regression coefficients. It follows that $\theta = \mathrm{logit}(p)$ is an N×1
vector.

2.  How  to  create  the  design  space 

Each MOS is taken individually for the estimation of officer attrition rates.
Every MOS has dimension 31x10 for 31 LOS's and 10 grades. The LOS and grade ranges
must be broken into segments, and each segment is a separate regression. As an
example, any MOS can be broken into four segments as in Table 4. Each segment has
its own X matrix. Each design space (X) has dimension N×m, where N stands for the
number of independent binomial responses and m stands for the number of explanatory
variables, which is always three in our case. This X matrix can be written

$$X_{(N\times 3)} = \begin{bmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ \vdots & \vdots & \vdots \\ x_{N1} & x_{N2} & x_{N3} \end{bmatrix}$$

with columns CNT, LOS and GR, where CNT means constant, which is the first column of X and is always one.
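A minimal sketch of how such a design matrix might be assembled for one segment is given below. It is not the APL function of Appendix A; the function name `build_design`, the dictionary layout and the structural-zero rule are all hypothetical and serve only to show the three columns CNT, LOS and GR and the dropping of structural zero cells.

```python
def build_design(cells, los_range, gr_range, is_structural_zero):
    """Build X (columns CNT, LOS, GR), losses y and central inventory n for one segment.

    cells maps (LOS, GR) -> (losses, central inventory).
    Cells outside the segment or flagged as structural zeros are skipped.
    """
    X, y, n = [], [], []
    for (los, gr), (losses, inventory) in sorted(cells.items()):
        in_segment = (los_range[0] <= los <= los_range[1]
                      and gr_range[0] <= gr <= gr_range[1])
        if not in_segment or is_structural_zero(los, gr):
            continue
        X.append([1.0, float(los), float(gr)])   # CNT is always one
        y.append(losses)
        n.append(inventory)
    return X, y, n


# Hypothetical cells; the first three mirror the opening rows of Table 5.
cells = {
    (5, 4): (0, 4.5),
    (5, 5): (6, 33.5),
    (5, 6): (1, 3.0),
    (5, 7): (0, 0.0),    # assumed structural zero, for illustration only
    (20, 7): (1, 6.0),   # outside the segment used below
}

X, y, n = build_design(
    cells,
    los_range=(5, 14),
    gr_range=(4, 7),
    is_structural_zero=lambda los, gr: (los, gr) == (5, 7),
)
```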


C.  A  NUMERICAL  EXAMPLE  FROM  THE  USMC  DATA 

As an illustration of the standard output from a maximum likelihood fit and the
use of the logistic regression model, we will use the case where military occupational
specialty (MOS) = 20 (motor transport, from Table 2), length of service (LOS) =
from 5 to 14 years and grades = 4, 5, 6, 7 (second lieutenant, first lieutenant, captain
and major, from Table 1). The data are listed in Table 5. They are obtained using the
APL data manipulation functions described in detail in Appendix A.

In Table 5, the structural zero inventory cells are dropped before applying the
fitting procedure. The output listed in Table 6 is obtained using the APL logistic
regression functions in Appendix A. We get the estimated coefficients of regression as
follows,

$\hat\beta_1 = 0.548539$
$\hat\beta_2 = -0.17092$
$\hat\beta_3 = -0.20117$
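The fit can be re-created outside APL with a short Newton-Raphson (IRLS) sketch. The code below is not the Appendix A implementation; the function names are hypothetical, the cell data are transcribed from Table 5, and the iteration starts from zero coefficients as an assumption. Provided the table has been transcribed correctly, the result should land close to the coefficients quoted above.

```python
import math

# (LOS, GR, losses y_i, central inventory n_i), transcribed from Table 5.
CELLS = [
    (5, 4, 0, 4.5), (5, 5, 6, 33.5), (5, 6, 1, 3.0),
    (6, 4, 0, 2.5), (6, 5, 5, 19.0), (6, 6, 0, 9.0),
    (7, 4, 0, 1.5), (7, 5, 1, 5.5), (7, 6, 2, 13.0),
    (8, 5, 3, 3.0), (8, 6, 1, 14.0),
    (9, 4, 0, 1.0), (9, 5, 3, 4.5), (9, 6, 2, 12.5),
    (10, 4, 0, 1.0), (10, 5, 0, 3.5), (10, 6, 0, 12.0), (10, 7, 0, 0.5),
    (11, 4, 0, 0.5), (11, 5, 1, 7.0), (11, 6, 0, 5.0), (11, 7, 0, 3.0),
    (12, 5, 0, 7.0), (12, 6, 1, 5.0), (12, 7, 0, 4.0),
    (13, 5, 0, 10.5), (13, 6, 0, 4.5), (13, 7, 0, 3.0),
    (14, 5, 0, 10.0), (14, 6, 1, 7.0), (14, 7, 0, 4.0),
]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    M = [list(A[i]) + [b[i]] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            for j in range(c, k + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * k
    for r in reversed(range(k)):
        x[r] = (M[r][k] - sum(M[r][j] * x[j] for j in range(r + 1, k))) / M[r][r]
    return x


def working_quantities(X, y, n, beta):
    """Return score vector s, variances v and X'VX evaluated at beta."""
    m = len(beta)
    p = [1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, row)))) for row in X]
    v = [ni * pi * (1.0 - pi) for ni, pi in zip(n, p)]
    s = [yi - ni * pi for yi, ni, pi in zip(y, n, p)]
    XtVX = [[sum(vi * row[a] * row[c] for vi, row in zip(v, X)) for c in range(m)]
            for a in range(m)]
    return s, v, XtVX


def fit_logistic(X, y, n, max_iter=50, tol=1e-10):
    """Newton-Raphson: beta <- beta + (X'VX)^{-1} X's, starting from zero."""
    beta = [0.0] * len(X[0])
    for _ in range(max_iter):
        s, v, XtVX = working_quantities(X, y, n, beta)
        Xts = [sum(si * row[a] for si, row in zip(s, X)) for a in range(len(beta))]
        step = solve(XtVX, Xts)
        beta = [b + d for b, d in zip(beta, step)]
        if max(abs(d) for d in step) < tol:
            break
    return beta


X = [[1.0, float(los), float(gr)] for los, gr, _, _ in CELLS]
y = [c[2] for c in CELLS]
n = [c[3] for c in CELLS]

beta_hat = fit_logistic(X, y, n)
s, v, XtVX = working_quantities(X, y, n, beta_hat)
chi2 = sum(si * si / vi for si, vi in zip(s, v))
# m_ii = 1 - h_ii, with h_ii = v_i * x_i' (X'VX)^{-1} x_i
m_diag = [1.0 - vi * sum(a * b for a, b in zip(row, solve(XtVX, row)))
          for vi, row in zip(v, X)]
```

Because the trace of the hat matrix equals the number of fitted coefficients, the diagonal elements `m_diag` sum to N - m = 28, which gives a quick consistency check on the leverage calculation.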

TABLE 5
DATA

                          CENTRAL
        X                 LOSS      INVENTORY
  i    CNT   LOS   GR     y_i       n_i
  1     1     5     4      0         4.5
  2     1     5     5      6        33.5
  3     1     5     6      1         3
  4     1     6     4      0         2.5
  5     1     6     5      5        19
  6     1     6     6      0         9
  7     1     7     4      0         1.5
  8     1     7     5      1         5.5
  9     1     7     6      2        13
 10     1     8     5      3         3
 11     1     8     6      1        14
 12     1     9     4      0         1
 13     1     9     5      3         4.5
 14     1     9     6      2        12.5
 15     1    10     4      0         1
 16     1    10     5      0         3.5
 17     1    10     6      0        12
 18     1    10     7      0         0.5
 19     1    11     4      0         0.5
 20     1    11     5      1         7
 21     1    11     6      0         5
 22     1    11     7      0         3
 23     1    12     5      0         7
 24     1    12     6      1         5
 25     1    12     7      0         4
 26     1    13     5      0        10.5
 27     1    13     6      0         4.5
 28     1    13     7      0         3
 29     1    14     5      0        10
 30     1    14     6      1         7
 31     1    14     7      0         4

The deviance for the fit is 46.5863 on 28 degrees of freedom, and the corresponding
chi-squared statistic is 46.4579. Both exceed their asymptotic expectation of 28;
most of the excess comes from the 10th and 13th observations, which together contribute
about 31.4 to the chi-squared statistic. In Table 6, $\chi_i$ is the individual
component of $\chi^2$, $d_i$ is the component of deviance and $m_{ii}$ is the diagonal element of the
projection matrix M. The examination of $\chi_i$, $d_i$ and $m_{ii}$ calls attention to outlying and
influential points. The individual components of $\chi^2$ and of the deviance ($d_i$) are shown
in the logistic probability plot in Figure 3.1. Evidently, two observations, the 10th
and 13th, are not well fit by the model; their $\chi_i$ and deviance residuals deviate from the
straight line configuration of the others. Also, fitted values are plotted against the
components of the deviance and the components of $\chi^2$ in Figure 3.2. For
displaying $\chi_i$, $d_i$ and $m_{ii}$, index plots (i.e. $\chi_i$ vs i, $d_i$ vs i and $m_{ii}$ vs i)
are shown in Figure 3.3.

TABLE 6
OUTPUT

(-- marks a logit that is undefined because y_i = 0 or y_i = n_i.)

  i    logit(y_i/n_i)   x_i beta-hat    chi_i      d_i       m_ii
  1        --            -1.1108       -1.2172    1.6005    0.8124
  2      -1.5224         -1.3120       -0.4678   -0.6487    0.5298
  3      -0.6931         -1.5131        0.6884    1.2806    0.9213
  4        --            -1.2817       -0.8329    1.1066    0.9054
  5      -1.0296         -1.4829        0.8775    0.9521    0.8210
  6        --            -1.6841       -1.2924    1.7506    0.8388
  7        --            -1.4526       -0.5923    0.7941    0.9459
  8      -1.5040         -1.6538        0.1356    0.5473    0.9595
  9      -1.7047         -1.8540        0.1957    0.5482    0.8368
 10        --            -1.8247        4.3133    3.4417    0.9784
 11      -2.5649         -2.0259       -0.5256   -1.0382    0.8666
 12        --            -1.7945       -0.4076    0.5545    0.9633
 13      -0.6931         -1.9957        3.5753    2.3189    0.9625
 14      -1.6582         -2.1969        0.7066    1.0379    0.8973
 15        --            -1.9654       -0.3742    0.5120    0.9620
 16        --            -2.1666       -0.6332    0.8713    0.9640
 17        --            -2.3678       -1.0602    1.4660    0.9027
 18        --            -2.5690       -0.1957    0.2716    0.9891
 19        --            -2.1364       -0.2429    0.3340    0.9802
 20      -1.7917         -2.3375        0.5116    1.0448    0.9114
 21        --            -2.5387       -0.6283    0.8717    0.9561
 22        --            -2.7399       -0.4401    0.6127    0.9429
 23        --            -2.5085       -0.7548    1.0466    0.8942
 24      -1.3862         -2.7096        1.2719    1.6268    0.9507
 25        --            -2.9108       -0.4665    0.6511    0.9309
 26        --            -2.6794       -0.8487    1.1804    0.8171
 27        --            -2.8806       -0.5024    0.7008    0.9499
 28        --            -3.0817       -0.3709    0.5187    0.9515
 29        --            -2.8503       -0.7604    1.0603    0.8054
 30      -1.7917         -3.0515        1.2450    1.5873    0.9133
 31        --            -3.2527       -0.3932    0.5509    0.9381

Also, we selected some cases to examine whether the regression coefficients
have temporal stability. The estimated regression coefficients are listed in
Table 7 for the selected cases.




TABLE 7
COEFFICIENTS OF REGRESSION FOR SOME CASES

MOS = 3 (INFANTRY)
                                        beta_1     beta_2     beta_3
 0 <= LOS <= 6  AND 4 <= GR <= 6        -5.786      0.037      0.764
 3 <= LOS <= 9  AND 4 <= GR <= 6        -2.029     -0.212      0.245
 9 <= LOS <= 19 AND 5 <= GR <= 8         4.714      0.047     -1.389
19 <= LOS <= 29 AND 7 <= GR <= 9        -1.376      0.191     -0.609

MOS = 7 (ENGINEER, CONSTRUCTION AND EQUIPMENT)
                                        beta_1     beta_2     beta_3
 0 <= LOS <= 6  AND 4 <= GR <= 6        -5.900      0.037      0.827
 3 <= LOS <= 9  AND 4 <= GR <= 6        -1.758     -0.129      0.129
 9 <= LOS <= 19 AND 5 <= GR <= 8         3.846     -0.160     -0.845
19 <= LOS <= 29 AND 7 <= GR <= 9         0.021      0.150     -0.639

MOS = 13 (OPERATIONAL COMMUNICATIONS)
                                        beta_1     beta_2     beta_3
 0 <= LOS <= 6  AND 4 <= GR <= 6        -5.995      0.038      0.884
 3 <= LOS <= 9  AND 4 <= GR <= 6        -1.188     -0.186      0.281
 9 <= LOS <= 19 AND 5 <= GR <= 8         3.366     -0.117     -0.776
19 <= LOS <= 29 AND 7 <= GR <= 9        -0.783      0.178     -0.614

MOS = 20 (MOTOR TRANSPORT)
                                        beta_1     beta_2     beta_3
 0 <= LOS <= 6  AND 4 <= GR <= 6        -7.406     -0.089      1.249
 3 <= LOS <= 9  AND 4 <= GR <= 6        -4.438     -0.066      0.646
 9 <= LOS <= 19 AND 5 <= GR <= 8         1.866     -0.315     -0.135
19 <= LOS <= 29 AND 7 <= GR <= 9        -0.440      0.009     -0.101

MOS = 38 (PILOTS AND NAVAL FLIGHT OFFICERS)
                                        beta_1     beta_2     beta_3
 0 <= LOS <= 6  AND 4 <= GR <= 6       -10.1922    -0.0404     1.5493
 3 <= LOS <= 9  AND 4 <= GR <= 6       -10.8841    -0.1476     1.7560
 9 <= LOS <= 19 AND 5 <= GR <= 8         2.1225    -0.1663    -0.4317
19 <= LOS <= 29 AND 7 <= GR <= 9         0.3936     0.2257    -0.8984
D.  VALIDATION  OF  MODEL

A  validation  test  was  conducted  to  evaluate  the  efficiency  of  the  logistic 
regression  model  for  the  estimation  of  the  L'SMC  officer  attrition  rates.  The  test  was 
conducted  as  follows: 

1.  Select  the  LOS's  and  grades  within  a  military  occupational  specialty.  The 
resulting  desired  array  will  be  three  dimensional  (years,  L.OS,  grades) 

2.  Let  "i"  stand  for  LOS,  then  i  =  0,...,30 

3.  Let  "j“  stand  for  GR,  then  j  =  0,...,9 

4.  Let  v..  =  number  of  leavers  in  cell  (i,i) 

5.  Let  n-  =  central  inventory  in  (i,j)  *  max  {(N(t)+N(t  +  l))/2,  Y(t )} 

6.  Let  t  =  1.....T  where  T  =  number  of  years  (i.e  from  1977  to  1983)  of  data  used 
to  create  the  estimator 

The  validation  procedure  used  t  =  1,...,4  (i.e.  from  1977  to  1980)  for  the  fitting 
and  t  =  5,6,7  (i.e.  from  1981  to  1983)  for  validation. 

The following procedure was used to validate the effectiveness of the logistic regression estimation process. We define an indicator variable

\[ D_{ij} = \begin{cases} 1 & \text{if } \hat{p}_{ij} \neq 0 \text{ or } 1 \\ 0 & \text{if } \hat{p}_{ij} = 0 \text{ or } 1 \end{cases} \]

Then

\[ K = \sum_{i} \sum_{j} D_{ij} \quad \text{for all } i \text{ and } j \]

where K is the number of cells that are not structural zeroes. The validation test can then be formulated as a chi-square goodness-of-fit test statistic as follows

\[ \text{Chi-square MOE} = \sum_{i} \sum_{j} D_{ij} \, \frac{n_{ij} \, (p_{ij} - \hat{p}_{ij})^{2}}{\hat{p}_{ij} (1 - \hat{p}_{ij})} \quad \text{for all } i \text{ and } j \]

where \hat{p}_{ij} is found from the fitting using the estimator years, and p_{ij} (= y_{ij}/n_{ij}) is obtained from the leavers and the central inventory n_{ij} of the validation years. For our numerical example (MOS = 3, LOS = 5 through 14 and GR = 4, 5, 6, 7) we get the following validation test results for the years 1981, 1982 and 1983: the MOE values are 52.6998, 36.4182 and 30.6585 respectively.
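The chi-square measure of effectiveness described above can be sketched in a few lines of Python. This is only an illustration, not the thesis program: the function name `chi_square_moe` and the flat-list layout of the cells are assumptions, and the weighting by the central inventory n_ij follows the VALIDATION listing in Appendix A.

```python
def chi_square_moe(p_hat, leavers, inventory):
    """Return (MOE, K) for one validation year.

    p_hat     -- fitted attrition rates from the estimation years
    leavers   -- y_ij observed in the validation year
    inventory -- central inventory n_ij of the validation year
    All three are equal-length flat lists over the cells (i, j).
    """
    moe = 0.0
    k = 0
    for ph, y, n in zip(p_hat, leavers, inventory):
        # D_ij = 0: skip cells whose fitted rate is exactly 0 or 1
        if ph <= 0.0 or ph >= 1.0:
            continue
        k += 1
        # an empty cell has n_ij = 0 and therefore contributes nothing
        p_obs = y / n if n > 0 else 0.0
        moe += n * (p_obs - ph) ** 2 / (ph * (1.0 - ph))
    return moe, k
```

A call such as `chi_square_moe(p_hat, leavers_1981, inventory_1981)` (hypothetical variable names) would be repeated once per validation year.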


[Figure 3.2 Plots of fitted values vs x_i and d_i. Panel title: scatter plot of fitted values vs residuals; axis: components of chi-square.]

[Figure 3.3 Index plots of x_i, d_i and m_ii.]
E. COMPARISON OF THE FIGURES OF MERIT


In this section, we compare our figures of merit with Major Robinson's [Ref. 2] results. As mentioned before, he used limited translation shrinkage estimation (LTSE) to estimate USMC officer attrition rates. We have used a different estimation method on the same manpower data. He also used the procedure explained in the preceding section to validate the effectiveness of the limited translation shrinkage estimation. In order to compare the figures of merit of logistic regression and shrinkage estimation, we present results for some cases in Tables 8 and 9.

The tables show that shrinkage estimation performs better than logistic regression estimation for most of the selected cases. We cannot say, however, that limited translation shrinkage estimation is much better than logistic regression. The results are very close to each other for some cases, and logistic regression is sometimes better than shrinkage estimation (e.g. for the case MOS = 20, 3 ≤ LOS ≤ 9 and 4 ≤ GR ≤ 6).


TABLE 8

FIGURES OF MERIT

(0 ≤ LOS ≤ 6) AND (4 ≤ GR ≤ 6)

                                                    1981        1982        1983
MOS = 3 (INFANTRY)
  LTSE                                           27.8528     42.4799     45.9140
  REGRESSION                                     59.8577     88.5361     86.6193
MOS = 7 (ENGINEER, CONSTRUCTION AND EQUIPMENT)
  LTSE                                           13.2892     18.8664     20.7735
  REGRESSION                                     35.3195     31.3636     27.6810
MOS = 13 (OPERATIONAL COMMUNICATIONS)
  LTSE                                           22.4989     16.1496     13.5038
  REGRESSION                                     41.7272     31.5084     30.6847
MOS = 20 (MOTOR TRANSPORT)
  LTSE                                           15.9591     34.4740     17.8570
  REGRESSION                                     24.4329     28.3449     22.5246

(3 ≤ LOS ≤ 9) AND (4 ≤ GR ≤ 6)

                                                    1981        1982        1983
MOS = 3 (INFANTRY)
  LTSE                                           19.1602     67.2562     34.1118
  REGRESSION                                     73.0644     89.0204     61.9981
MOS = 7 (ENGINEER, CONSTRUCTION AND EQUIPMENT)
  LTSE                                           20.5515     19.8988     18.2333
  REGRESSION                                     60.5127     40.1607     26.2687
MOS = 13 (OPERATIONAL COMMUNICATIONS)
  LTSE                                           20.3665     15.3913     17.6670
  REGRESSION                                     28.6348     25.9982     32.2280
MOS = 20 (MOTOR TRANSPORT)
  LTSE                                           22.3545     52.2840     35.5580
  REGRESSION                                     26.1725     31.6402     19.7830
TABLE 8

FIGURES OF MERIT (CONT'D)

(9 ≤ LOS ≤ 19) AND (5 ≤ GR ≤ 8)

                                                    1981        1982        1983
MOS = 3 (INFANTRY)
  LTSE                                           84.5388     70.3422     40.2220
  REGRESSION                                    149.5783     61.7802     41.9882
MOS = 7 (ENGINEER, CONSTRUCTION AND EQUIPMENT)
  LTSE                                           42.4237     22.9296     17.3584
  REGRESSION                                     84.4140     48.6112     24.7120
MOS = 13 (OPERATIONAL COMMUNICATIONS)
  LTSE                                           48.3150     25.9520     26.6658
  REGRESSION                                    108.1312     41.2197     37.5635
MOS = 20 (MOTOR TRANSPORT)
  LTSE                                           20.5629     24.6164     16.2029
  REGRESSION                                     41.8773     44.0796     33.7604

(19 ≤ LOS ≤ 29) AND (7 ≤ GR ≤ 9)

                                                    1981        1982        1983
MOS = 3 (INFANTRY)
  LTSE                                           30.0620     18.9604     29.1716
  REGRESSION                                     46.3861     28.9819     32.3470
MOS = 7 (ENGINEER, CONSTRUCTION AND EQUIPMENT)
  LTSE                                           21.8423     25.2194     34.9758
  REGRESSION                                     28.3865     33.0140     35.8610
MOS = 13 (OPERATIONAL COMMUNICATIONS)
  LTSE                                           46.9617     20.6439     10.8807
  REGRESSION                                     77.5956     36.2923     21.5748
MOS = 20 (MOTOR TRANSPORT)
  LTSE                                           12.5150     15.5716     12.9169
  REGRESSION                                     23.2035     27.9930     31.8230

IV.  CONCLUSIONS  AND  RECOMMENDATIONS 


A.  CONCLUSIONS 

Recall that the logistic function and its inverse can be expressed as

\[ \theta = \ln\{p/(1-p)\} \quad \text{and} \quad p = e^{\theta}/(1+e^{\theta}) \]

Further, it is useful to record

\[ dp/d\theta = e^{\theta}/(1+e^{\theta})^{2} = p(1-p) \]

Identifying p as the attrition rate, we can use a first-order Taylor expansion to approximate the change in rates. Thus,

\[ \Delta p \approx p(1-p)\{\beta_{2}\,\Delta LOS + \beta_{3}\,\Delta GR\} \]

provides us with a linear approximation to the direction and amount of change.
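As an illustration of this ad hoc calculation, the following Python sketch compares the linear approximation with the exact change in the fitted rate. It assumes the linear predictor has the form β1 + β2·LOS + β3·GR with LOS and GR entered as raw values; the function names are hypothetical, and the coefficients are the Table 7 entries for MOS = 3, 3 ≤ LOS ≤ 9, 4 ≤ GR ≤ 6.

```python
import math


def attrition_rate(beta, los, gr):
    """Inverse logit of the assumed linear predictor b1 + b2*LOS + b3*GR."""
    theta = beta[0] + beta[1] * los + beta[2] * gr
    return math.exp(theta) / (1.0 + math.exp(theta))


def linear_change(beta, los, gr, d_los, d_gr):
    """First-order approximation: p(1-p) * (b2*dLOS + b3*dGR)."""
    p = attrition_rate(beta, los, gr)
    return p * (1.0 - p) * (beta[1] * d_los + beta[2] * d_gr)


# Table 7, MOS = 3, 3 <= LOS <= 9 and 4 <= GR <= 6
beta = (-2.029, -0.212, 0.245)

# effect of one additional year of service at LOS = 5, GR = 5
approx = linear_change(beta, 5, 5, 1, 0)
exact = attrition_rate(beta, 6, 5) - attrition_rate(beta, 5, 5)
print(approx, exact)
```

The two numbers agree in sign and roughly in size, which is all the linear approximation is meant to deliver.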

Although the logistic regression approach does not improve upon the attrition rate estimators developed by Tucker [Ref. 1] and Robinson [Ref. 2], it does point to the direction of change as one varies LOS and GR. To this end, it was necessary to partition the 30 year LOS range into segments. It is an exercise in curiosity to speculate as to the reasons for the observed behavior in these segments. Here is our offering:

1. 0 ≤ LOS ≤ 5; attrition rates are chaotic as young officers "test the waters".

2. 3 ≤ LOS ≤ 9; attrition rates decline with increasing LOS as officers commit themselves to longer second and third contracts. One would think that advancement in grade would also correlate with a lower rate, but we do not see that in Table 7. There are also other kinds of shifts influencing the attrition behavior in these years.

3. 9 ≤ LOS ≤ 19; the maturing career commitment has been made and rates decline with increasing LOS and GR.

4. 19 ≤ LOS ≤ 30; since advancement opportunities of the senior officer are quite limited, we see rates increasing with LOS and decreasing with advances in grade.

B.  RECOMMENDATIONS 

The linear approximation to the effect of change would be most useful if we could group the MOS categories into sets with common regression coefficients and if these coefficients were stable over time. To pursue each of these contingencies requires additional work and an expanded data base. The programs developed in this thesis serve as a foundation for extension.
APPENDIX A
APL FUNCTIONS


1. GENERAL

This appendix contains APL functions for the data manipulation, logistic regression and the validation of the model. The original data is on a magnetic tape named COUNTS prepared by the Navy Personnel Research and Development Center (NPRDC). Robinson [Ref. 2] explained the conversion of raw data from tape to an APL workspace. In order to get the LOSSXX (losses) and INVXX (inventories) arrays, the procedure should be followed in the order presented by Robinson. XX is the applicable fiscal year (e.g. 77 for fiscal year 1977).

2. DATA MANIPULATION FUNCTIONS

Some APL functions were developed by Tucker and Robinson for the data manipulation and execution of calculations pertaining to the processes under evaluation. These functions are summarized in the following section. We use some of them in this project: GETINV, INVMATX, GETLOSS and MATRIX. Two more APL functions were utilized to manipulate the data for the logistic regression and validation.

a. Creating the inventory and loss arrays

Using the INVXX arrays, the APL functions GETINV in Figure A.1 and INVMATX in Figure A.2 create the arrays IXX. Note that GETINV calls INVMATX and INVMATX uses the INVXX arrays. APL workspace size limitations may be a problem due to the large amount of data. It may be necessary to create one or two arrays at a time and copy them to another workspace.

The LXX arrays are created in a similar manner, using the APL functions GETLOSS in Figure A.3 and MATRIX in Figure A.4. The APL function MATRIX uses the loss arrays LOSSXX. The resulting matrices are IXX and LXX for fiscal year XX. The functions INVMATX and MATRIX could create a matrix of dimension 7x40x10x31 for 7 years, 40 MOS's, 10 grades and 31 LOS's. However, due to limited workspace, the dimension 40x31x10 for 40 MOS's, 31 LOS's and 10 grades was commonly utilized.


Figure A.2 APL Function INVMATX.


b. Manipulation of the data for regression and validation

The function GETCENINV in Figure A.5 creates the central inventory, which is assigned to CIXX for the fiscal years from 1977 to 1983. The function GETCENINV uses the global variables IXX and LXX for the inventory and loss matrices, respectively, for fiscal year XX.




    ∇ GETLOSS
    ⍝ THIS FUNCTION CALLS MATRIX FOR EACH FISCAL
    ⍝ YEAR. LXX IS THE LOSS ARRAY FOR FISCAL YEAR
    ⍝ XX BY MOS/LOS/GRADE.
    L77←MATRIX LOSS77
    L78←MATRIX LOSS78
    L79←MATRIX LOSS79
    L80←MATRIX LOSS80
    ⍝ L81←MATRIX LOSS81
    ⍝ L82←MATRIX LOSS82
    ⍝ L83←MATRIX LOSS83
    ∇


Figure A.3 APL Function GETLOSS.


    ∇ Z←MATRIX X;A;B;C;D;E;F;I;J
    ⍝ THIS FUNCTION CREATES THE LOSS ARRAY FOR THE FISCAL
    ⍝ YEARS USING THE ARRAY OF LOSS INDICES LOSSXX. IT IS
    ⍝ CALLED BY GETLOSS. LOSSXX MUST BE A CHARACTER VECTOR
    ⍝ WITH 9 DATA ENTRIES FOLLOWED BY 1 BLANK FOR EACH LOOP.
    Z←(40 31 10)⍴0
    I←⍴X
    J←(I+1)÷10
    LOOP:→(J=0)/OUT
A+ftd+X) 

fl4-i+r*f2+X4.(i+x))) 

Ol  +  CftCl+X+d+X))) 

04.1  +  (»(2+X4.(l4-X))) 

S«-ft(l+X«-(2+X)) 

F+»(  2  +  X4-(l+X$) 

^?s?iS>ZCB'iiC3tF 

    J←J-1
    →LOOP
    OUT:'FINISHED -- SHAPE OF MATRIX IS'


Figure  A. 4  APL  Function  MATRIX. 

The function GETDATA in Figure A.6 manipulates the data for the regression and validation procedures. The outputs IEST and LEST are the sums of CIXX and LXX respectively, where "XX" is the fiscal years 1977 to 1980, i.e. the first four years are used for the estimation. "IVALXX" and "LVALXX" are the CIXX and LXX respectively, where "XX" here is the fiscal years from 1981 to 1983, i.e. the last three


    ∇ GETCENINV
    ⍝ GET THE CENTRAL INVENTORY DATA FOR
    ⍝ THE FISCAL YEARS FROM 1977 TO 1983
    CI77←((I77+I78)÷2)⌈L77
    CI78←((I78+I79)÷2)⌈L78
    CI79←((I79+I80)÷2)⌈L79
    CI80←((I80+I81)÷2)⌈L80
    CI81←((I81+I82)÷2)⌈L81
    CI82←((I82+I83)÷2)⌈L82
    CI83←I83⌈L83
    ∇

Figure  A.5  APL  Function  GETCENINV. 


    ∇ GETDATA
    ⍝ MANIPULATE THE DATA TO USE IN REGRESSION
    ⍝ AND VALIDATION PROCEDURES
    IEST←CI77+CI78+CI79+CI80
    LEST←L77+L78+L79+L80
    IVAL81←CI81
    IVAL82←CI82
    IVAL83←CI83
    LVAL81←L81
    LVAL82←L82
    LVAL83←L83
    ∇


Figure  A. 6  APL  Function  GETDATA. 

years are used for the validation procedure. The function GETDATA uses the global variables CIXX and LXX for the central inventory matrix and loss matrix for fiscal year "XX".

c.  Why  the  central  inventory? 

A problem arises on several occasions when the data is disaggregated to a level for which the inventory is very small. For example, in a particular fiscal year the inventory can be zero for a given combination of length of service (LOS) and military occupational specialty (MOS). The inventory in the next fiscal year for the same LOS and MOS combination may also be zero. The problem arises when the number of leavers in such a cell is equal to or greater than one.


    ∇ LOGISTIC
    ⍝ THIS IS THE MAIN FUNCTION FOR THE REGRESSION DIAGNOSTIC
    ⍝ AND THE VALIDATION. THIS FUNCTION CALLS THE FUNCTIONS
    ⍝ FITTED, RESIDUAL AND VALIDATION, WHICH MUST ALL BE
    ⍝ IN THE SAME APL WORKSPACE.
    FITTED
    RESIDUAL
    VALIDATION
    ⎕PP←8
    'WOULD YOU LIKE TO SEE RES, FITTED VALUES AND BETAHAT'
    '0 :NO  1 :YES'
    KK←⎕
    →(KK=0)/L14
    'BETAHAT IS'
    BETA
    'VECTOR OF FITTED VALUES'
    TETHAT
    'VECTOR OF COMPONENTS OF DEVIANCE IS'
    DEV
    'VECTOR OF COMPONENTS OF CHI-SQUARE IS'
    CHICOM
    'TOTAL DEVIANCE IS ',⍕D
    'CHI-SQUARE TEST STATISTIC IS ',⍕CHI
    L14:'WOULD YOU LIKE TO SEE THE VALIDATION RESULTS'
    '0 :NO  1 :YES'
    MM←⎕
    →(MM=0)/L15
    'CHI-SQUARE MOE FOR THE VALIDATION'
    '    1981    1982    1983'
    CHISQ
    'DEGREES OF FREEDOM IS ',⍕DEF
    ' '
    L15:'WOULD YOU LIKE TO RUN FOR ANOTHER CASE'
    '0 :NO  1 :YES'
    TT←⎕
    →(TT=0)/0
    LOGISTIC
    ∇


Figure A.7 APL Function LOGISTIC.

This can occur because the inventory figures refer to the instant at the beginning of the fiscal year, while the loss figures refer to any time during the year; i.e. an officer can both enter a cell and attrite from it at any time during the year. Then p (= y/n) would be ambiguous, where y is the number of leavers and n is the inventory at time t.

To remove this ambiguity from the data, the following policy was adopted to define the central inventory number for the officer force at disaggregated levels for any cell or collection of cells.

1. Let t = 1,...,6 refer to the years 1977,...,1982

2. Let Y(t) = Number of losses in year t

3. Let INV(t) = Inventory at the beginning of year t


    ∇ FITTED
    ⍝ THIS FUNCTION IS FOR THE CALCULATION OF THE
    ⍝ COEFFICIENTS AND FITTED VALUES OF THE LOGISTIC
    ⍝ REGRESSION.
    'ENTER MOS'
    MOS←⎕
    'ENTER LOS'
    LOS←⎕
    'ENTER GR'
    GR←⎕
    INV1←IEST[(1+MOS);(1+LOS);(1+GR)]
    LOSS1←LEST[(1+MOS);(1+LOS);(1+GR)]

AN-p  ( .  INV1 ) 

X+$((3,X)p(Xpl), (,S((<pGR),(pLOS))pLOS)), (XpGR)) 

Xl+X 

EP-*-lE  8 

NU-lK,l)p(,INVl) 

Yl+{K,l)p(..LOSSl ) 

Xl+J/Xl 

Nl+J/Nl 

Yl+JSYl 

BETA+iX  l+(pXl)),l)pO 
L  2  :  SETA  1 +-BETA 
TETHAT+X 1 + . xfi£TA 

S+Y 1  -Wl xPffAIV  ( (*TETHAT  ) *  ( 1  +  ( ★Z’EI’ffAZ’ )  )  ) 

V1+  ( 1V1  x  (  arm/AP )  )  *  (  ( 1  +  (  *TETHAT  )  )  *  2  ) 

AT«-pV+"  •  VI 

V+i((N,N)pV))x(iN)<>.  =  (iN) 

BETA+BETA  + (((&((  (W1  )  +  .  xy  )  + .  xXl  )  )  + .  x  (fc>Xl  )  )  + .  xS  ) 


)p(,INV  1) 
)p(.L0SSl) 
)*0  ) 


-»-L  2  x  i  EP<  |  fl-p.BEM 
J^(iiV)o.  =  (i/V) 

^U(??2-5Hi£xU+-x$i(((^P+*x7>+-xX1>>> 

MW-((B+.x(*Xl))  +  .x(y*0.5)) 

MD++/1  ((ilV)«.  =  (ilv)  }xMl ) 
Figure A.8 APL Function FITTED.


4. Let N(t) = maximum of Y(t) and the average inventory, computed from the beginning inventories in years t and t+1 as (INV(t) + INV(t+1))/2. N(t) is the central inventory of year t. This provides the elements for a more accurate estimation of the attrition rate at the disaggregated level.
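The central inventory rule can be written compactly. The following Python sketch is an illustration only (the function name and the scalar, per-cell interface are assumptions); it mirrors the policy stated above for a single cell.

```python
def central_inventory(inv_t, inv_next, losses_t):
    """Central inventory N(t) for one cell.

    inv_t    -- INV(t), inventory at the beginning of year t
    inv_next -- INV(t+1), inventory at the beginning of year t+1
    losses_t -- Y(t), losses during year t
    """
    average = (inv_t + inv_next) / 2.0
    # taking the maximum guarantees N(t) >= Y(t), so y/n never exceeds 1
    return max(losses_t, average)
```

For the problem case described earlier, a cell with zero inventory in both years but one leaver gets `central_inventory(0, 0, 1)`, i.e. a central inventory of 1 and an observed rate of 1 instead of an undefined ratio.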


3. LOGISTIC REGRESSION AND VALIDATION FUNCTIONS

The following APL functions were utilized for the logistic regression and the validation of the model. These functions must be in the same APL workspace. They use the global variables IEST, LEST, IVAL81, IVAL82, IVAL83, LVAL81, LVAL82 and LVAL83, which are the output of the function GETDATA.


    ∇ RESIDUAL
    ⍝ THIS FUNCTION IS FOR THE CALCULATION OF THE
    ⍝ RESIDUAL VECTORS OF THE REGRESSION.
ff<-(  .YlzO)Af(,Yl)*,lVl) 

NH+k/,N 1 
Ytf«-ff/,Yl 
P1+YH*NH 

TETHA+toiPl  *  ( 1  -PI  )  ) 

DEV-*-  2*  (TH-TETHA  ) 

DEVI*-  ( pTETHAT  )pH\DEV 

U-*-  ,11-0 

NU+U/ ,N1 

PHATU+U / ,PHAT 

A 1*  2*NUx(®M+(l-PHATU )) 

AH-CprErffADptfUi 

Z«-(,Y1)=,JV1 

JVZ«-Z/,1V1 

PffArZ-«-Z/,PHA2’ 

A2«-  2*NZx(®PHATZ ) 

A2-e(pZ’E™Z,)pZ\A2 

PP7«*-DP72+A2 

0«-+/(jD£V) 

C1^£>P7<0 

C2+0P72O 

0EF«-(C2-Cl)x((  DP7)*0.5) 

TEPA  «-  (  p  TETHA  T)pH\  TETHA 
VAR+NlxPHAT *  <  1  -PHA21 ) 

CffI«-+/((S*2)*VAP) 

CPJCOM-*-S+(7AP*0.5) 
Figure A.9 APL Function RESIDUAL.


a.  Function  LOGISTIC 


APL function LOGISTIC in Figure A.7 is the main function for the regression and validation calculations. This function calls the FITTED, RESIDUAL and VALIDATION functions. These functions cannot be run alone; they must be run by the function LOGISTIC. In other words, they are subfunctions of the main function LOGISTIC. These subfunctions are discussed below.

b.  Function  FITTED 

APL  function  FITTED  in  Figure  A. 8  finds  the  fitted  values  of  the  regression. 
This  function  uses  global  variables  "IEST"  and  "LEST". 
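For readers without an APL interpreter, the fitting step can be sketched in Python. This is not a transcription of FITTED: it is a generic Newton-Raphson (iteratively reweighted least squares) fit for grouped binomial data, assuming a design row of [1, LOS, GR] per cell and the same 1E-8 convergence tolerance that appears in the listing. The function names `solve` and `fit_logistic` are hypothetical, and the linear solver does not guard against a singular information matrix.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_logistic(rows, leavers, inventory, tol=1e-8, max_iter=50):
    """Maximum likelihood fit of logit(p) = x . beta for grouped binomial data.

    rows      -- covariate rows, e.g. [1.0, LOS, GR] for each cell
    leavers   -- y for each cell
    inventory -- n (central inventory) for each cell
    """
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for x, y, n in zip(rows, leavers, inventory):
            theta = sum(b * xi for b, xi in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-theta))
            w = n * p * (1.0 - p)          # binomial variance weight
            for a in range(k):
                score[a] += x[a] * (y - n * p)
                for c in range(k):
                    info[a][c] += x[a] * w * x[c]
        step = solve(info, score)          # Newton step: info^-1 * score
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta
```

The returned list plays the role of BETA, and the fitted logits (TETHAT in the listing) follow by multiplying each design row by it.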


c.  Function  RESIDUAL 


APL  function  RESIDUAL  in  Figure  A. 9  calculates  the  array  of  the  residuals. 
This  function  is  just  the  continuation  of  the  function  FITTED. 
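The two kinds of residual used in the diagnostics can be sketched as follows. The Python below uses the standard binomial definitions of the signed deviance component and the chi-square (Pearson) component; it is an assumption that RESIDUAL computes exactly these quantities, and the function names are hypothetical. Fitted rates must lie strictly between 0 and 1 and inventories must be positive.

```python
import math


def _xlogy(a, b):
    """a * ln(a / b), with the convention that the term is 0 when a is 0."""
    return 0.0 if a == 0 else a * math.log(a / b)


def residual_components(leavers, inventory, p_hat):
    """Return (deviance components, chi-square components), one per cell."""
    dev, chi = [], []
    for y, n, p in zip(leavers, inventory, p_hat):
        fitted = n * p
        d2 = 2.0 * (_xlogy(y, fitted) + _xlogy(n - y, n - fitted))
        sign = 1.0 if y >= fitted else -1.0
        dev.append(sign * math.sqrt(max(d2, 0.0)))
        chi.append((y - fitted) / math.sqrt(fitted * (1.0 - p)))
    return dev, chi
```

Summing the squares of the two returned lists gives the total deviance and the chi-square goodness of fit statistic, respectively.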


d. Function VALIDATION
    ∇ VALIDATION
    ⍝ THIS FUNCTION IS FOR THE CALCULATION OF THE
    ⍝ CHI-SQUARE STAT. (CHISQ) FOR THE FISCAL YEARS
    ⍝ FROM 1981 TO 1983.
    CHISQ←3⍴0
    I←1
    INV2←IVAL81[(1+MOS);(1+LOS);(1+GR)]
    LOSS2←LVAL81[(1+MOS);(1+LOS);(1+GR)]
    →L10
    L4:INV2←IVAL82[(1+MOS);(1+LOS);(1+GR)]
    LOSS2←LVAL82[(1+MOS);(1+LOS);(1+GR)]
    →L10
    L5:INV2←IVAL83[(1+MOS);(1+LOS);(1+GR)]
    LOSS2←LVAL83[(1+MOS);(1+LOS);(1+GR)]

L10:Tl+( ,INV 2*0) 

NT1+T1/ ,INV2 
YT1+T1/ .LOSS2 
P+YT1+NT1 
P+(K,l)pTl\P 
N2<-(K,l)p(tINV2 ) 

PHATl+lk, 1 )pJ\ ( .PPAP) 
c<-(PMn*o )  a  (pp4ri*i ) 

7iN-+/P 

rPJ5|  [I]  -*■+/  (  (  (PH AT  1  -P  )  *  2  )  Xtf  2  XD  )  *  (PffATl  *  ( 1  -PflAPl  )  ) 

-*\l-2  )/L4 
-*■(1=3  )/L5 
DEF+N-3 
Figure A.10 APL Function VALIDATION.

APL function VALIDATION in Figure A.10 calculates the chi-square statistics for the fiscal years from 1981 to 1983. This function uses the global variables IVALXX and LVALXX, where "XX" are the fiscal years from 1981 to 1983.

e. Description of the output variables

In  this  section,  we  will  describe  the  output  variables  which  are  used  in  the 
APL  functions. 

BETA   : vector of the regression coefficients

TETHA  : vector of logit(p) where p = y/n

TETHAT : vector of fitted values

DEV    : vector of components of the deviance

CHICOM : vector of individual components of χ²

MD     : vector of diagonal elements of the projection matrix

CHI    : the chi-squared goodness of fit statistic for the estimation years

D      : total deviance

CHISQ  : the vector of chi-squared test statistics for the validation years

DEF    : degrees of freedom
APPENDIX B
GRAPHS


This appendix contains graphical illustrations of the fitting for the estimation of USMC officer attrition rates. Some cases were selected from the USMC manpower data to illustrate whether or not the logistic regression model fits the data well. Each case has its own regression. In Figure B.1 through Figure B.8, the following plots are shown for each case.

1.  logistic  probability  plot  of  components  of  the  deviance 

2.  logistic  probability  plot  of  components  of  the  chi-square 

3.  scatter  plot  of  fitted  values  vs  components  of  the  deviance 

4.  scatter  plot  of  fitted  values  vs  components  of  the  chi-square 


[Plots not reproduced. For each case the recoverable panel titles are "Logistic prob plot of components of deviance" and "Logistic prob plot of components of chi-square", with scatter plots against "Fitted values". The recoverable cases and captions are listed below.]

MOS = 3, 0 ≤ LOS ≤ 6, 4 ≤ GR ≤ 6

Figure B.3 Illustration of fitting for MOS = 13, LOS = 0-6, GR = 4-6.

MOS = 20, 0 ≤ LOS ≤ 6, 4 ≤ GR ≤ 6

Figure B.5 Illustration of fitting for MOS = 3, LOS = 19-29, GR = 7-9.

Figure B.8 Illustration of fitting for MOS = 20, LOS = 19-29, GR = 7-9.